16 research outputs found

    BioNLP Shared Task 2011 - Bacteria Gene Interactions and Renaming

    Document type: Proceedings paper. Conference date: June 23-24, 2011. Conference location: Portland, OR. We present two related tasks of the BioNLP Shared Task 2011: Bacteria Gene Renaming (Rename) and Bacteria Gene Interactions (GI). We detail the objectives, the corpus specification, and the evaluation metrics, and we summarize the participants' results. Both tasks draw on PubMed abstracts from the scientific literature: the Rename task aims at extracting gene name synonyms, and the GI task aims at extracting genic interaction events, mainly gene transcriptional regulations in bacteria.

    BioNLP Shared Task - The Bacteria Track

    Background: We present the BioNLP 2011 Shared Task Bacteria Track, the first Information Extraction challenge entirely dedicated to bacteria. It includes three tasks that cover different levels of biological knowledge. The Bacteria Gene Renaming supporting task aims at extracting gene renaming and gene name synonymy from PubMed abstracts. The Bacteria Gene Interaction task is a gene/protein interaction extraction task on individual sentences. The interactions are categorized into ten sub-types, giving a detailed account of genetic regulations at the molecular level. Finally, the Bacteria Biotopes task focuses on the localization and environment of bacteria mentioned in textbook articles. We describe how the three corpora were created, including document acquisition and manual annotation, as well as the metrics used to evaluate the participants' submissions. Results: Three teams submitted to the Bacteria Gene Renaming task; the best team achieved an F-score of 87%. For the Bacteria Gene Interaction task, the only participant reached a global F-score of 77%, although the system's performance varies significantly from one sub-type to another. Three teams submitted to the Bacteria Biotopes task with very different approaches; the best team achieved an F-score of 45%. A detailed study of the participating systems nevertheless reveals the strengths and weaknesses of each. Conclusions: The three tasks of the Bacteria Track offer participants a chance to address a wide range of issues in Information Extraction, including entity recognition, semantic typing and coreference resolution. We found common trends in the best-performing systems: the systematic use of syntactic dependencies and machine learning. Nevertheless, the originality of the Bacteria Biotopes task encouraged the use of interesting novel methods and techniques, such as term compositionality, scopes wider than the sentence

    Application de l’apprentissage à l’extraction de connaissances à partir de notices bibliographiques en génomique

    Degree: University doctorate (Dr. d'Université). Our goal is the automatic semantic annotation of text, that is, making its meaning formally explicit. We rely on Information Extraction, whose aim is to extract a specific type of information from text in structured form by means of a set of rules. These rules are acquired with machine learning techniques. We focus on genomics, whose literature is particularly difficult to process automatically. Indeed, state-of-the-art methods rely on a deep analysis of the text and on extraction rules that use syntactic and semantic features. These rules are usually designed by hand, and we have shown that they can be acquired automatically from annotated examples. We propose a methodology in which the ontology (the formal model of the domain) is at the heart of the annotation process, whether for expert annotation, automatic semantic annotation, or the definition of the text representation used for learning. This representation is defined declaratively, by making explicit a "lexical layer" on top of the ontology that links the conceptual level to the lexical level. The approach is highly generic and makes it easy to test multiple representations. We validated it on the extraction of genic interactions, which answers a strong demand from the biology community. To make the deep analysis of the text easier, we filter out irrelevant documents with learning methods that exploit a shallow analysis of the text. Our results are of good quality compared with other approaches

    Information extraction as an ontology population task and Its application to genic interactions

    Ontologies are a well-motivated formal representation to model the knowledge needed to extract and encode data from text. Yet, their tight integration with Information Extraction (IE) systems is still a research issue, a fortiori with complex ones that go beyond hierarchies. In this paper we introduce an original architecture where IE is specified by designing an ontology, and the extraction process is seen as an Ontology Population (OP) task. Concepts and relations of the ontology define a normalized text representation. As their abstraction level is irrelevant for text extraction, we introduce a Lexical Layer (LL) along with the ontology, i.e. relations and classes at an intermediate level of normalization between raw text and concepts. In contrast to previous IE systems, the extraction process only involves normalizing the outputs of Natural Language Processing (NLP) modules with instances of the ontology and the LL. All the remaining reasoning is left to a query module, which uses the inference rules of the ontology to derive new instances by deduction. In this context, these inference rules subsume classical extraction rules or patterns by providing access to the appropriate abstraction level and domain knowledge. To acquire those rules, we adopt an Ontology Learning (OL) perspective, and automatically acquire the inference rules with relational Machine Learning (ML). Our approach is validated on a genic interaction extraction task from a Bacillus subtilis bacterium text corpus. We reach a global recall of 89.3% and a precision of 89.6%, with high scores for the ten conceptual relations in the ontology

    Genic interaction extraction by reasoning on an ontology

    Information Extraction (IE) systems have been proposed in recent years to extract genic interactions from bibliographical resources. But they are limited to single interaction relations, and face a trade-off between recall and precision, focusing either on specific interactions (for precision) or on general and unspecified interactions of biological entities (for recall). Yet, biologists need to process more complex data from the literature in order to study biological pathways, so an ontology is an adequate formal representation to model this sophisticated knowledge. But the tight integration of IE systems and ontologies is still a current research issue, a fortiori with complex ones that go beyond hierarchies. Here, we propose a rich modeling of genic interactions with an ontology, and show how it can be used within an IE system. The ontology is seen as a language specifying a normalized representation of text. IE is performed by first extracting instances from Natural Language Processing (NLP) modules; deductive inferences on the ontology language are then carried out. New instances may be inferred, bringing together otherwise scattered textual information. We validated our approach on an annotated corpus of gene transcription regulations in Bacillus subtilis. We reach a global recall of 89.3% and a precision of 89.6%, with high scores for the ten semantic relations defined in the ontology
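
The two-step process described in this abstract (extract instances first, then derive new ones by deduction) can be illustrated with a minimal forward-chaining sketch. The relation names (`product_of`, `binds_to`, `promoter_of`, `interaction`), the single rule, and the entity names are hypothetical and chosen only for illustration; they are not taken from the authors' ontology or system.

```python
from typing import Set, Tuple

# A fact is (relation, subject, object). All relation names are hypothetical.
Fact = Tuple[str, str, str]


def apply_rule(facts: Set[Fact]) -> Set[Fact]:
    """Hypothetical rule:
    product_of(P, A) & binds_to(P, S) & promoter_of(S, B) => interaction(A, B)
    """
    derived: Set[Fact] = set()
    for rel1, protein, gene_a in facts:
        if rel1 != "product_of":
            continue
        for rel2, protein2, site in facts:
            if rel2 != "binds_to" or protein2 != protein:
                continue
            for rel3, site2, gene_b in facts:
                if rel3 == "promoter_of" and site2 == site:
                    derived.add(("interaction", gene_a, gene_b))
    return derived


def forward_chain(facts: Set[Fact]) -> Set[Fact]:
    """Apply the rule repeatedly until no new fact is produced (fixpoint)."""
    known = set(facts)
    while True:
        new_facts = apply_rule(known) - known
        if not new_facts:
            return known
        known |= new_facts


# Instances as they might come out of NLP modules, possibly from
# different sentences (placeholder entity names).
extracted = {
    ("product_of", "proteinA", "geneA"),
    ("binds_to", "proteinA", "promoterB"),
    ("promoter_of", "promoterB", "geneB"),
}
closure = forward_chain(extracted)
print(("interaction", "geneA", "geneB") in closure)
```

The derived `interaction` fact combines three separately extracted instances, which is the sense in which deduction brings together otherwise scattered textual information.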

    Learning ontological rules to extract multiple relations of genic interactions from text

    Introduction: Information extraction (IE) systems have been proposed in recent years to extract genic interactions from bibliographical resources. They are limited to single interaction relations, and have to face a trade-off between recall and precision, by focusing either on specific interactions (for precision), or general and unspecified interactions of biological entities (for recall). Yet, biologists need to process more complex data from literature, in order to study biological pathways. An ontology is an adequate formal representation to model this sophisticated knowledge. However, the tight integration of IE systems and ontologies is still a current research issue, a fortiori with complex ones that go beyond hierarchies. Method: We propose a rich modeling of genic interactions with an ontology, and show how it can be used within an IE system. The ontology is seen as a language specifying a normalized representation of text. First, IE is performed by extracting instances from natural language processing (NLP) modules. Then, deductive inferences on the ontology language are completed, and new instances are derived from previously extracted ones. Inference rules are learnt with an inductive logic programming (ILP) algorithm, using the ontology as the hypothesis language, and its instantiation on an annotated corpus as the example language. Learning is set in a multi-class setting to deal with the multiple ontological relations. Results: We validated our approach on an annotated corpus of gene transcription regulations in the Bacillus subtilis bacterium. We reach a global recall of 89.3% and a precision of 89.6%, with high scores for the ten semantic relations defined in the ontology
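
The abstracts in this listing report either an F-score or a recall/precision pair. As a rough aid for comparing them, the sketch below computes the F-measure from precision and recall. It assumes the balanced F1 (harmonic mean, `beta = 1`); the abstracts do not state which variant was used, and the function name `f_score` is introduced here for illustration only.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-measure from precision and recall.

    beta = 1 gives the balanced F1 (harmonic mean); this choice is an
    assumption, not something stated in the abstracts.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention: avoid division by zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Recall 89.3% and precision 89.6%, as reported in the abstract above.
print(f"{f_score(precision=0.896, recall=0.893):.4f}")
```

Under that assumption, the reported recall and precision correspond to an F1 of roughly 89.4%.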